Abstract. We study a weaker formulation of the nullspace property which guarantees recovery of sparse signals from linear measurements by $\ell_1$ minimization. We require this condition to hold only with high probability, given a distribution on the nullspace of the coding matrix $A$. Under some assumptions on the distribution of the reconstruction error, we show that testing these weak conditions means bounding the optimal value of two classical graph partitioning problems: the $k$-Dense-Subgraph and MaxCut problems. Both problems admit efficient, relatively tight relaxations and we use semidefinite relaxation techniques to produce tractable bounds. We test the performance of our results on several families of coding matrices.



1. INTRODUCTION 

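Given a coding matrix $A \in \mathbb{R}^{q \times n}$ and a signal $e \in \mathbb{R}^n$, we focus on conditions under which the solution $x_0$ to the following minimum cardinality problem

$$
\begin{array}{rll}
x_0 = & \mbox{min.} & \mathbf{Card}(x) \\
& \mbox{subject to} & Ax = Ae,
\end{array}
\qquad (\ell_0\text{-recov.})
$$

which is a combinatorial problem in $x \in \mathbb{R}^n$, can be recovered by solving

$$
\begin{array}{rll}
x^{lp} = & \mbox{min.} & \|x\|_1 \\
& \mbox{subject to} & Ax = Ae,
\end{array}
\qquad (\ell_1\text{-recov.})
$$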

which is a convex program in $x \in \mathbb{R}^n$. Problem ($\ell_0$-recov.) arises in various fields ranging from signal processing to statistics. Suppose for example that we make a few linear measurements of a high dimensional signal, which admits a sparse representation in a well chosen basis (e.g. Fourier, wavelet). Under certain conditions, solving ($\ell_1$-recov.) will allow us to reconstruct the signal exactly (Donoho, 2004; Donoho and Tanner, 2005; Donoho, 2006). In a coding application, suppose we transmit a message which is corrupted by a few errors; solving ($\ell_1$-recov.) will then allow us to reconstruct the message exactly (Candes and Tao, 2005, 2006). Finally, problem ($\ell_1$-recov.) is directly connected to variable selection and penalized regression problems (e.g. the LASSO of Tibshirani (1996)) arising in statistics (Zhao and Yu, 2006; Meinshausen and Yu, 2008; Meinshausen et al., 2007; Candes and Tao, 2007; Bickel et al., 2007; Candes and Plan, 2009). Of course, in all these fields, problems ($\ell_0$-recov.) and ($\ell_1$-recov.) are overly simplified. In practice for example, the observations could be noisy, approximate solutions might be sufficient and we might have strict computational limits on the decoding side. While important, these extensions are outside the scope of this work.

Based on results by Vershik and Sporyshev (1992) and Affentranger and Schneider (1992), Donoho and Tanner (2005) showed that when the solution $x_0$ of ($\ell_0$-recov.) is sparse with $\mathbf{Card}(x_0) = k$ and the coefficients of $A$ are i.i.d. Gaussian, then w.h.p. the solution of the (convex) problem in ($\ell_1$-recov.) will always match that of the combinatorial problem in ($\ell_0$-recov.) provided $k$ is below an explicitly computable strong recovery threshold $k_S$. They also show that if $k$ is below another (larger) weak recovery threshold $k_W$, then these solutions match with an exponentially small probability of failure.

Generic conditions for strong recovery based on sparse extremal eigenvalues, or restricted isometry properties (RIP), were also derived in Candes and Tao (2005) and Candes and Tao (2006), who proved that certain random matrix classes satisfied these conditions near optimal values of $k$ with an exponentially small probability of failure. Simpler, weaker conditions which can be traced back to Donoho and Huo (2001), Zhang (2005) or Cohen et al. (2009) for example, are based on properties of the nullspace of $A$. When the signal cardinality $\mathbf{Card}(e) \le k$ and the Nullspace Property (NSP) holds, i.e. when there is a constant $\alpha_k < 1/2$ such that

$$\|x\|_{k,1} \le \alpha_k \|x\|_1 \qquad \mbox{(det-NSP)}$$

for all vectors $x \in \mathbb{R}^n$ with $Ax = 0$, then solving the convex problem ($\ell_1$-recov.) will recover the global solution to the combinatorial problem ($\ell_0$-recov.). Condition (det-NSP) can be understood as an incoherence measure, i.e. it means that not all of the mass in $x$ can be concentrated among only $k$ coefficients, in other words:

Good coding matrices have incoherent nullspace vectors.

In particular, this condition means that the nullspace of $A$ cannot contain sparse vectors. Furthermore, the constant $\alpha_k$ can be used to explicitly bound the reconstruction error when solving the $\ell_1$-recovery problem in ($\ell_1$-recov.). This is illustrated in Proposition 2.1 below, directly adapted from Cohen et al. (2009, Th. 4.3).

One fundamental issue with the sparse recovery conditions described above is that, except for explicit thresholds available for certain types of random matrices (with high probability), testing these conditions on generic (deterministic) matrices is potentially harder than solving the combinatorial $\ell_0$-norm minimization problem in ($\ell_0$-recov.), as it implies either solving a combinatorial problem to compute $\alpha_k$ in (det-NSP), or computing sparse eigenvalues. Recent results in Candes and Plan (2009) show that the traditional (and tractable) incoherence conditions ensure recovery of sparse signals with high probability, given a uniform distribution on the signal. These incoherence conditions lack universality however, in the sense that contrary to the combinatorial conditions mentioned above, they cannot be used to guarantee recovery of all signals of near-optimal size $k$. Convex relaxation bounds were used in d'Aspremont et al. (2008) (on sparse eigenvalues), Juditsky and Nemirovski (2008) or d'Aspremont and El Ghaoui (2011) (on
NSP) to test sparse recovery conditions similar to (det-NSP) on arbitrary matrices. Unfortunately, the performance (tightness) of these relaxations is still insufficient: for matrices satisfying the sparse recovery conditions in Candes and Tao (2005) up to signal cardinality $k^*$, these three relaxations can only certify that the conditions hold up to cardinality $O(\sqrt{k^*})$ and are also likely to provide poor bounds on reconstruction error.

In what follows, we seek to enforce a weaker version of condition (det-NSP). We will bound the incoherence measure in (det-NSP) with high probability over a random sample of vectors in the nullspace of $A$. Another way to look at this approach is to remember that, if $x^{lp}$ solves the $\ell_1$-decoding problem in ($\ell_1$-recov.), the vector $x^{lp} - e$ is always in the nullspace of $A$ and Proposition 2.1 below shows that enforcing condition (det-NSP) on the reconstruction error $x^{lp} - e$ allows us to bound the magnitude of this error.

Here, because we cannot efficiently test condition (det-NSP) over all vectors in the nullspace of $A$, we will instead require condition (det-NSP) to hold only with high probability on the nullspace of $A$, given a distribution on this subspace. Let us assume for simplicity that $\mathbf{Rank}(A) = q$, and let $F \in \mathbb{R}^{n \times m}$ with $m = n - q$ be a basis for the nullspace of $A$ (not necessarily orthogonal or normalized). We will require that the NSP condition (det-NSP) discussed above, which reads

$$\|Fy\|_{k,1} \le \alpha_k \|Fy\|_1 \qquad \mbox{(proba-NSP)}$$

be satisfied with high probability, given a distribution on $y$. We will start by assuming that $y$ is Gaussian. In this case, we will see that both sides of condition (proba-NSP) can be explicitly controlled by the solution of classic graph partitioning problems. These combinatorial problems admit tight, efficiently computable approximations which will allow us to bound the probability that (proba-NSP) holds. We will then extend these results to more general distributions on the nullspace and show that the same quantities which controlled concentration in the Gaussian case also control fluctuations in the more general model.

Of course, assuming the true distribution on the signal $e$ is either sparse or follows a power law, our simple model on the nullspace of $A$ could have zero measure with respect to the true (structured) distribution of $x^{lp} - e$. In fact, at first sight, we are implicitly positing a model on the reconstruction error, then ultimately using the model to bound this same reconstruction error, an apparent circular reference. Our main objective however is not to directly bound the error but rather to isolate efficiently computable quantities which will be good proxies for this error, sacrificing some statistical accuracy in favor of computational efficiency. Moreover, our main result is to efficiently approximate the Lipschitz constants of the two norms in (proba-NSP), constants which are likely to play a critical role whatever the model on the reconstruction error.

Current results in compressed sensing provide universal recovery guarantees using intractable conditions (which can only be tested with high probability on random matrices). Our objective here is to do the opposite and isolate tractable measures of performance that can be computed on arbitrary matrices, even if this means losing some confidence in our signal recovery guarantees. Numerical experiments detailed at the end of this work, using simple models for $e$, seem to suggest that our assumptions on $x^{lp} - e$ are not completely unreasonable (cf. Figure 1). Furthermore, the fact that the true signal $e$ is inherently structured means that, in principle, these statistical fidelity questions would arise with any model on $e$.

Our contribution here is twofold. First, assuming a Gaussian model or bounded independent model on the nullspace of the matrix $A$ in ($\ell_0$-recov.), we show that testing if the NSP condition (proba-NSP) holds with high probability amounts to bounding the value of two classic graph partitioning problems: MaxCut and $k$-Dense-Subgraph. Second, we show new approximation results for semidefinite relaxations of the $k$-Dense-Subgraph problem when the graph weight matrix is positive semidefinite but has coefficients of arbitrary sign. This result has applications outside of the compressed sensing context discussed in this paper, and is directly related to correlation clustering for example. Solving a $k$-Dense-Subgraph problem on a (positive semidefinite) correlation matrix (modeling similarities between variables) isolates a highly correlated $k$-cluster of variables. Here, we use these approximation results to show that our weak recovery conditions can be certified in polynomial time for arbitrary matrices even when the target cardinality $k$ is near the true recovery threshold $k^*$ (i.e. a log term away).

The paper is organized as follows. Conditions for sparse recovery with high probability, given a model on the nullspace of the sampling matrix, are derived in Section 2. The performance of these conditions and links with the restricted isometry property are discussed in Section 3. Section 4 derives semidefinite relaxations and approximation results for the graph partitioning problems used in testing these weak recovery conditions. Section 5 briefly discusses the complexity of solving these relaxations. Section 6 shows that these approximation results allow us to certify weak recovery for near optimal values of the signal cardinality $k$. Finally, we present some numerical experiments in Section 7.

Notation. For $x \in \mathbb{R}^n$, we write $\|x\|_{k,1}$ the sum of the magnitudes of the $k$ largest coefficients of $x$. When $X \in \mathbb{R}^{m \times n}$, $X_i$ is the $i^{th}$ row of $X$, $\|X\|_2$ the spectral norm and $\|X\|_F$ the Frobenius (Euclidean) norm of $X$. For matrices $A, B \in \mathbb{R}^{m \times n}$, we write $A \otimes B$ their Kronecker product and $A \circ B$ their Schur (componentwise) product. We write $\mathbf{NumRank}(X)$ the numerical rank of the matrix $X$, with $\mathbf{NumRank}(X) = \|X\|_F^2 / \|X\|_2^2$, and $\mathbf{NumCard}(x)$ is the numerical cardinality of a vector $x$, with $\mathbf{NumCard}(x) = \|x\|_1^2 / \|x\|_2^2$. Finally, we write $x \preceq_c y$ when $\mathbf{E}[f(x)] \le \mathbf{E}[f(y)]$ for any convex function $f : \mathbb{R}^n \to \mathbb{R}$.
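
As a small illustration of this notation, the sketch below computes the norm $\|x\|_{k,1}$ and the numerical cardinality of a vector. It assumes that the numerical cardinality is the ratio $\|x\|_1^2/\|x\|_2^2$; the function names `k_norm` and `num_card` are ours and purely illustrative.

```python
def k_norm(x, k):
    """Sum of the magnitudes of the k largest-magnitude coefficients of x."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(mags[:k])


def num_card(x):
    """Numerical cardinality, assumed here to be ||x||_1^2 / ||x||_2^2."""
    l1 = sum(abs(v) for v in x)
    l2_squared = sum(v * v for v in x)
    return l1 * l1 / l2_squared


x = [3.0, -4.0, 0.0, 1.0]
print(k_norm(x, 2))   # 4 + 3 = 7.0
print(num_card(x))    # 8^2 / 26, roughly 2.46
```

With this definition, a vector with $p$ equal nonzero coefficients has numerical cardinality exactly $p$, which is why the quantity behaves like a smooth proxy for the number of nonzero entries.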



2. Weak recovery conditions 

To highlight the central role of the NSP condition in $\ell_1$ decoding, we begin by adapting a result from Cohen et al. (2009, Th. 4.3) which uses the constant $\alpha_k$ to bound the reconstruction error when decoding the observations $Ae$ by solving problem ($\ell_1$-recov.). Recall that $x^{lp}$ is the solution to the linear program in ($\ell_1$-recov.) and $e$ the true signal.

Proposition 2.1. Suppose that $\|x^{lp} - e\|_{k,1} \le \alpha_k \|x^{lp} - e\|_1$ for some $\alpha_k < 1/2$, where $e \in \mathbb{R}^n$ and $x^{lp} \in \mathbb{R}^n$ solves problem ($\ell_1$-recov.), then $A(x^{lp} - e) = 0$ and

$$\|x^{lp} - e\|_1 \le \frac{2}{1 - 2\alpha_k} \min_{\{y \in \mathbb{R}^n :\, \mathbf{Card}(y) \le k\}} \|y - e\|_1, \qquad (1)$$

where the right-hand side is proportional to the best $\ell_1$ reconstruction error on $e$ using a signal with cardinality $k$.

Proof. We adapt the proof of Cohen et al. (2009, Th. 4.3). Because $x^{lp}$ solves ($\ell_1$-recov.), we have $\|x^{lp}\|_1 \le \|e\|_1$ since $e$ is feasible. Denoting by $T$ the indices of the $k$ largest coefficients of $e$ and by $\eta = x^{lp} - e$ the reconstruction error, we write

$$\|x^{lp}_T\|_1 + \|x^{lp}_{T^c}\|_1 \le \|e_T\|_1 + \|e_{T^c}\|_1$$

and triangle inequalities yield

$$\|e_T\|_1 - \|\eta_T\|_1 + \|\eta_{T^c}\|_1 - \|e_{T^c}\|_1 \le \|e_T\|_1 + \|e_{T^c}\|_1$$

hence

$$\|\eta_{T^c}\|_1 \le \|\eta_T\|_1 + 2\|e_{T^c}\|_1.$$

Note that by definition of $T$, we have $\|e_{T^c}\|_1 = \min_{\{y \in \mathbb{R}^n :\, \mathbf{Card}(y) \le k\}} \|y - e\|_1$. From our assumption on $\eta$ and by definition of $\|\cdot\|_{k,1}$, $|T| = k$ means

$$\|\eta_T\|_1 \le \|\eta\|_{k,1} \le \alpha_k \|\eta\|_1 = \alpha_k \left(\|\eta_T\|_1 + \|\eta_{T^c}\|_1\right)$$

hence

$$\|\eta_T\|_1 \le \frac{\alpha_k}{1 - \alpha_k} \|\eta_{T^c}\|_1$$

which then yields

$$\|\eta_{T^c}\|_1 \le \frac{2(1 - \alpha_k)}{1 - 2\alpha_k} \min_{\{y \in \mathbb{R}^n :\, \mathbf{Card}(y) \le k\}} \|y - e\|_1.$$

Using the fact that

$$\|\eta\|_1 = \|\eta_T\|_1 + \|\eta_{T^c}\|_1 \le \left(1 + \frac{\alpha_k}{1 - \alpha_k}\right) \|\eta_{T^c}\|_1$$

we get $\|\eta\|_1 \le \|\eta_{T^c}\|_1 / (1 - \alpha_k)$, which produces the desired result. ■

This last result shows that whenever the reconstruction error satisfies (det-NSP) with constant $\alpha_k < 1/2$, then the magnitude of this error is at most $2/(1 - 2\alpha_k)$ times the best possible reconstruction error achievable using a signal of size $k$.

2.1. Gaussian model. In what follows, we will use concentration inequalities to bound both sides of the probabilistic Nullspace Property inequality (proba-NSP), namely check that

$$\|Fy\|_{k,1} \le \alpha_k \|Fy\|_1$$

holds with high probability when $y$ is Gaussian with $y \sim \mathcal{N}(0, I_m)$, where $F$ is a basis of the nullspace of $A$. Of course, this means that we implicitly assume that the reconstruction error $x^{lp} - e$ follows a Gaussian model. Outside of tractability benefits, there is no fundamental reason to pick a Gaussian distribution on the nullspace of $A$ here, except that its rotational invariance means the basis matrix $F$ only has to be defined up to a rotation. This is consistent with the fact that recovery performance, as characterized by the nullspace property (det-NSP), is only a function of the nullspace of $A$ and not of its matrix representation. Concentration inequalities on Lipschitz functions of Gaussian variables then translate (proba-NSP) into explicit conditions on the matrix $F$.

We begin with the following lemma controlling the left-hand side of this inequality.

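Lemma 2.2. Suppose $F \in \mathbb{R}^{n \times m}$ and $y \sim \mathcal{N}(0, I_m)$, then

$$\mathbf{Prob}\left[\|Fy\|_{k,1} \ge \mathbf{E}[\|Fy\|_{k,1}] + x\right] \le e^{-\frac{x^2}{2\sigma_k^2(F)}}$$

where

$$\sigma_k^2(F) \triangleq \max_{\{u \in \{0,1\}^{2n},\ \mathbf{1}^T u \le k\}} u^T \left( \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \otimes FF^T \right) u \qquad (2)$$

and

$$\mathbf{E}[\|Fy\|_{k,1}] \le \sigma_k(F) \sqrt{2 \log\left(2^k \binom{n}{k}\right)} \le \sigma_k(F) \sqrt{2k\left(1 + \log\left(\frac{2n}{k}\right)\right)}.$$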



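Proof. We can write the left-hand side of inequality (proba-NSP) as

$$\max_{\{u = (u_+, u_-) \in \{0,1\}^{2n},\ \mathbf{1}^T u \le k\}} (u_+ - u_-)^T F y$$

which means that $\|Fy\|_{k,1}$ is the maximum of Gaussian variables. Concentration results detailed in (Massart, 2007, Th. 3.12) for example show that

$$\mathbf{Prob}\left[\|Fy\|_{k,1} \ge \mathbf{E}[\|Fy\|_{k,1}] + x\right] \le e^{-\frac{x^2}{2\sigma_k^2(F)}}$$

where $\sigma_k^2(F)$ is defined as

$$\max_{\{u = (u_+, u_-) \in \{0,1\}^{2n},\ \mathbf{1}^T u \le k\}} \mathbf{E}\left[\left((u_+ - u_-)^T F y\right)^2\right].$$

We have

$$\mathbf{E}\left[\left((u_+ - u_-)^T F y\right)^2\right] = (u_+ - u_-)^T FF^T (u_+ - u_-) = u^T \begin{pmatrix} FF^T & -FF^T \\ -FF^T & FF^T \end{pmatrix} u$$

and we recover (2) after setting $u = (u_+, u_-)$. Note that we also have

$$\|Fy\|_{k,1} = \max_{\{v \in V_k\}} v^T F y,$$

where $V_k$ is the set of vectors of size $n$ with exactly $k$ entries equal to $+1$ or $-1$, and $n - k$ zeroes. Each $v^T F y$ is Gaussian with zero mean and variance $v^T FF^T v$, so $\|Fy\|_{k,1}$ is the maximum of $2^k \binom{n}{k}$ Gaussian random variables. Using (Massart, 2007, Lem. 2.3) we can therefore bound the expectation as follows

$$\mathbf{E}[\|Fy\|_{k,1}] \le \sigma_k(F) \sqrt{2 \log\left(2^k \binom{n}{k}\right)}$$

and

$$\binom{n}{k} \le \frac{n^k}{k!} \le \left(\frac{ne}{k}\right)^k$$

yields the desired result. ■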



n (ne 
< — < — 
- k\ - \ k 



Note that the bound in $\exp(-x^2/2\sigma_k^2(F))$ can be replaced by $2(1 - N(x/\sigma_k(F)))$ (see e.g. Massart (2007, Thm 3.8)), where $N(x)$ is the Gaussian CDF, which is smaller for larger values of $x$. Expression (2) means $\sigma_k^2(F)$ is the optimum value of a $k$-Dense-Subgraph problem. Several efficient approximation algorithms have been derived for this graph partitioning problem and will be discussed in Section 4. We now apply similar concentration results to control the fluctuations of the right-hand side of inequality (proba-NSP).
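
To make the definition of $\sigma_k(F)$ concrete, the following sketch evaluates it by exhaustive enumeration on a toy matrix, using the equivalent form in which $u_+ - u_-$ ranges over vectors with entries in $\{-1, 0, 1\}$ and at most $k$ nonzeros. This is only an illustration under our own naming (`gram`, `sigma_k`, `expectation_bound` are hypothetical helpers): the enumeration is exponential in $n$, which is precisely why relaxations are needed for matrices of realistic size.

```python
import itertools
import math


def gram(F):
    """Return G = F F^T for F given as a list of n rows of length m."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in F] for ri in F]


def sigma_k(F, k):
    """Brute-force sigma_k(F): sqrt of the largest v^T F F^T v over sign
    vectors v in {-1, 0, 1}^n with at most k nonzeros (toy sizes only)."""
    G = gram(F)
    n = len(F)
    best = 0.0
    for size in range(1, k + 1):
        for support in itertools.combinations(range(n), size):
            for signs in itertools.product((-1.0, 1.0), repeat=size):
                # v^T G v for the sign vector v supported on `support`
                val = sum(si * sj * G[i][j]
                          for i, si in zip(support, signs)
                          for j, sj in zip(support, signs))
                best = max(best, val)
    return math.sqrt(best)


def expectation_bound(F, k):
    """Bound sigma_k(F) * sqrt(2 log(2^k C(n, k))) on E||Fy||_{k,1}."""
    n = len(F)
    count = 2 ** k * math.comb(n, k)
    return sigma_k(F, k) * math.sqrt(2.0 * math.log(count))


F = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy basis with n = 3, m = 2
for k in (1, 2, 3):
    print(k, sigma_k(F, k), expectation_bound(F, k))
```

On this example the values of $\sigma_k^2(F)$ are 2, 5 and 8 for $k = 1, 2, 3$, so the constant grows with $k$ as expected.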

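Lemma 2.3. Suppose $F \in \mathbb{R}^{n \times m}$ and $y \sim \mathcal{N}(0, I_m)$, then

$$\mathbf{Prob}\left[\|Fy\|_1 \le \mathbf{E}[\|Fy\|_1] - x\right] \le e^{-\frac{x^2}{2L^2(F)}}$$

where

$$\mathbf{E}[\|Fy\|_1] = \sqrt{\frac{2}{\pi}} \sum_{i=1}^n \|F_i\|_2$$

and $L^2(F) = \max_{v \in \{-1,1\}^n} v^T FF^T v$ $(= \sigma_n^2(F))$ is bounded by the following MaxCut relaxation

$$
\frac{2}{\pi} L_{mxct}^2(F) \le L^2(F) \le L_{mxct}^2(F) \triangleq
\begin{array}[t]{ll}
\mbox{max.} & \mathbf{Tr}(XFF^T) \\
\mbox{s.t.} & \mathbf{diag}(X) = \mathbf{1},\ X \succeq 0,
\end{array}
\qquad (3)
$$

with, in particular, $L_{mxct}(F) \le \sqrt{n}\, \|F\|_2$.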



Proof. We can write

$$\|Fy\|_1 = \max_{v \in \{-1,1\}^n} v^T F y$$

and (Massart, 2007, Th. 3.12) shows that

$$\mathbf{Prob}\left[\|Fy\|_1 \le \mathbf{E}[\|Fy\|_1] - x\right] \le e^{-\frac{x^2}{2L^2(F)}}.$$

The fact that $\mathbf{E}[|g|] = \sqrt{2/\pi}\, \sigma$ whenever $g \sim \mathcal{N}(0, \sigma^2)$ produces the expectation, and the Lipschitz constant $L(F)$ in this inequality is given by the largest variance

$$L^2(F) = \max_{v \in \{-1,1\}^n} v^T FF^T v,$$

hence is the solution of a graph partitioning problem similar to MaxCut. Relaxation results in (Goemans and Williamson, 1995; Nesterov, 1998a) show that this combinatorial problem can be bounded by solving

$$
\begin{array}{ll}
\mbox{maximize} & \mathbf{Tr}(XFF^T) \\
\mbox{subject to} & \mathbf{diag}(X) = \mathbf{1},\ X \succeq 0,
\end{array}
$$

which is a semidefinite relaxation in $X \in \mathbf{S}_n$ of the maximum variance problem (tight up to a factor $\pi/2$). Its dual is written

$$
\begin{array}{ll}
\mbox{minimize} & \mathbf{1}^T w \\
\mbox{subject to} & FF^T \preceq \mathbf{diag}(w),
\end{array}
$$

which is another semidefinite program in the variable $w \in \mathbb{R}^n$. By weak duality, any feasible point of this last problem gives an upper bound on $L_{mxct}^2(F)$. In particular, the point $w = \lambda_{\max}(FF^T)\mathbf{1}$ is dual feasible and yields $L_{mxct}(F) \le \sqrt{n}\, \|F\|_2$. ■
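
The two quantities in Lemma 2.3 can be checked numerically on a toy example. The sketch below computes $L(F)$ by enumerating all sign vectors (exponential in $n$, so for illustration only) and compares the closed-form mean $\sqrt{2/\pi}\sum_i \|F_i\|_2$ with a Monte Carlo estimate. The helper names and the toy matrix are our own assumptions.

```python
import itertools
import math
import random


def l1_lipschitz(F):
    """Brute-force L(F) = max over v in {-1, 1}^n of ||F^T v||_2."""
    n, m = len(F), len(F[0])
    best = 0.0
    for v in itertools.product((-1.0, 1.0), repeat=n):
        Ftv = [sum(v[i] * F[i][j] for i in range(n)) for j in range(m)]
        best = max(best, math.sqrt(sum(t * t for t in Ftv)))
    return best


def mean_l1(F):
    """Closed form sqrt(2/pi) * sum_i ||F_i||_2 for y ~ N(0, I_m)."""
    row_norms = (math.sqrt(sum(a * a for a in row)) for row in F)
    return math.sqrt(2.0 / math.pi) * sum(row_norms)


def monte_carlo_mean_l1(F, samples=20000, seed=0):
    """Empirical mean of ||Fy||_1 over standard Gaussian samples y."""
    rng = random.Random(seed)
    m = len(F[0])
    total = 0.0
    for _ in range(samples):
        y = [rng.gauss(0.0, 1.0) for _ in range(m)]
        total += sum(abs(sum(a * b for a, b in zip(row, y))) for row in F)
    return total / samples


F = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy basis with n = 3, m = 2
print(l1_lipschitz(F))                     # sqrt(8), attained at v = (1, 1, 1)
print(mean_l1(F), monte_carlo_mean_l1(F))  # the two values should be close
```

For this matrix $\|F\|_2 = \sqrt{3}$, so the bound $\sqrt{n}\,\|F\|_2 = 3$ sits just above $L(F) = \sqrt{8}$.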

The bound detailed in Lemma 2.3 is directly related to the MatrixNorm problem discussed in Nemirovski (2001) and Steinberg and Nemirovski (2005) or the spin glass models of statistical mechanics. In particular, our approximation bound on $L(F)$ can be directly deduced from the bound on the induced matrix norm $\|\cdot\|_{2,1}$ derived in Steinberg and Nemirovski (2005, Prop. 1.4). Note also that the mean $\mathbf{E}[\|Fy\|_1] = \sqrt{2/\pi} \sum_{i=1}^n \|F_i\|_2$ is typically much larger than the factor $L(F)$ controlling concentration. In fact, we can write

$$\sum_{i=1}^n \|F_i\|_2 = \|F\|_F \, \mathbf{NumCard}(\{\|F_i\|_2\})^{1/2} = \|F\|_2 \, \mathbf{NumRank}(F)^{1/2} \, \mathbf{NumCard}(\{\|F_i\|_2\})^{1/2}.$$

Combining the last two lemmas, we show the following proposition, which is our main recovery condition.

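Proposition 2.4. If $F \in \mathbb{R}^{n \times m}$ satisfies

$$\left( \sqrt{2k\left(1 + \log\left(\frac{2n}{k}\right)\right)} + \beta \right) \sigma_k(F) \le \left( \sqrt{\frac{2}{\pi}} \sum_{i=1}^n \|F_i\|_2 - \beta L(F) \right) \alpha_k \qquad (4)$$

for some $\beta > 0$, where $\sigma_k(F)$ was defined in (2) and $L(F)$ in (3), then the sparse recovery condition (proba-NSP) will be satisfied with probability $1 - 2e^{-\beta^2/2}$ when $y \sim \mathcal{N}(0, I_m)$.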

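Proof. We combine the bounds of Lemmas 2.2 and 2.3, taking $x = \beta \sigma_k(F)$ in the first and $x = \beta L(F)$ in the second, so that each bound fails with probability at most $e^{-\beta^2/2}$, and conclude with a union bound. ■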
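
Once bounds on $\sigma_k(F)$ and $L(F)$ are available, testing the recovery condition is a direct numerical comparison. The sketch below is one hedged way to write that test. It is an assumption of ours, not the paper's exact statement: on the left-hand side it uses the expectation bound $\sigma_k(F)\sqrt{2\log(2^k\binom{n}{k})}$ from Lemma 2.2 rather than its simplified form, and the function names are hypothetical.

```python
import math


def weak_recovery_holds(sigma_k, L, row_norms, k, alpha_k, beta):
    """Sufficient test for (proba-NSP) under the Gaussian model.

    sigma_k and L are (upper bounds on) sigma_k(F) and L(F), row_norms
    holds the Euclidean norms ||F_i||_2 of the rows of F."""
    n = len(row_norms)
    # Upper bound on E||Fy||_{k,1}, plus beta standard deviations.
    lhs = (math.sqrt(2.0 * math.log(2 ** k * math.comb(n, k))) + beta) * sigma_k
    # Exact mean of ||Fy||_1, minus beta standard deviations, scaled by alpha_k.
    rhs = (math.sqrt(2.0 / math.pi) * sum(row_norms) - beta * L) * alpha_k
    return lhs <= rhs


def success_probability(beta):
    """Probability 1 - 2 exp(-beta^2 / 2) attached to the test above."""
    return 1.0 - 2.0 * math.exp(-beta ** 2 / 2.0)


print(weak_recovery_holds(1.0, 1.0, [1.0] * 100, k=1, alpha_k=0.4, beta=2.0))
print(success_probability(2.0))
```

Because upper bounds on $\sigma_k(F)$ and $L(F)$ only make the test harder to pass, plugging in relaxation values in place of the exact constants keeps the guarantee valid.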

We finish this section by showing that the function $\sigma_k(F)$ defined in (2) is increasing with $k$, which will prove useful in the results that follow.

Lemma 2.5. Let $F \in \mathbb{R}^{n \times m}$, the function $\sigma_k(F)$ is increasing in $k \in [1, n]$, with

$$\sigma_1^2(F) = \max_{i = 1, \ldots, n} (FF^T)_{ii} \quad \mbox{and} \quad \sigma_n(F) = L(F)$$

where $L(F)$ is defined in Lemma 2.3.
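Proof. We can write

$$
\begin{array}{rcl}
\sigma_k^2(F) & = & \displaystyle\max_{\{(u_+, u_-) \in \{0,1\}^{2n},\ \mathbf{1}^T u \le k\}} \|(u_+ - u_-)^T F\|_2^2 \\
& = & \displaystyle\max_{\{v \in \{0,1\}^n,\ \mathbf{1}^T v \le k,\ u \in \{-1,1\}^n\}} u^T (vv^T \circ FF^T) u.
\end{array}
$$

Let us call $v(k)$, $u(k)$ the optimal solutions of the maximization problem with optimal value $\sigma_k^2(F)$, and let $J = \{i \in [1, n] : v(k)_i \neq 0\}$ be the support of $v(k)$. If we pick $i \in [1, n]$ outside of $J$, we have

$$
\begin{array}{rcl}
\sigma_{k+1}^2(F) & \ge & \displaystyle u(k)^T \left(v(k)v(k)^T \circ FF^T\right) u(k) + (FF^T)_{ii} + \max_{u_i \in \{-1,1\}} 2u_i \left( \sum_{j \in J} u(k)_j (FF^T)_{ij} \right) \\
& = & \displaystyle \sigma_k^2(F) + (FF^T)_{ii} + 2 \left| \sum_{j \in J} u(k)_j (FF^T)_{ij} \right|.
\end{array}
$$

Hence the difference between $\sigma_{k+1}^2(F)$ and $\sigma_k^2(F)$ is at least $\max_{i \notin J} (FF^T)_{ii}$. This means that $\sigma_k(F)$ is increasing, with $\sigma_k^2(F)$ bounded by

$$\max_{u \in \{-1,1\}^n} u^T FF^T u,$$

which is the maximization problem defining $L^2(F)$ in Lemma 2.3. ■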

2.2. Independent, bounded model. The previous section showed that enforcing condition (proba-NSP) with high probability for Gaussian vectors $y$ meant controlling the ratio between the Lipschitz constant $\sigma_k(F)$ of the norm $\|Fy\|_{k,1}$ and the norm $\sum_{i=1}^n \|F_i\|_2$. In what follows, we will show that the same quantities control the concentration of $\|Fy\|_{k,1}$ and $\|Fy\|_1$ when the coefficients of $y$ are independent and bounded. Once again, because $F$ is defined up to a rotation here, these results are easily extended to the case where $y = Qu$ with $Q^T Q = I$ and the variables $u$ are independent and bounded. We can write a weak recovery condition for this bounded model, similar to condition (4).

Proposition 2.6. Let $F \in \mathbb{R}^{n \times m}$ and suppose

$$\mathbf{E}[\|Fy\|_{k,1}] + \beta \sigma_k(F) \le \left(\mathbf{E}[\|Fy\|_1] - \beta L(F)\right) \alpha_k \qquad (5)$$

for some $\beta > 0$, where $\sigma_k(F)$ was defined in (2) and $L(F)$ in (3), then the sparse recovery condition (proba-NSP)

$$\|Fy\|_{k,1} \le \alpha_k \|Fy\|_1$$

will be satisfied with probability $1 - 2c\, e^{-\beta^2/(cA^2)}$, where $c > 0$ is an absolute constant, when the coefficients of $y \in \mathbb{R}^m$ are independent and bounded, with $\|y\|_\infty \le A$.

Proof. As pointwise suprema of affine functions in $y$, the functions $\|Fy\|_{k,1}$ and $\|Fy\|_1$ are convex and Lipschitz with constants bounded by $\sigma_k(F)$ and $L(F)$ respectively (see the proofs of Lemmas 2.2 and 2.3). If the coefficients of $y \in \mathbb{R}^m$ are independent and bounded, with $\|y\|_\infty \le A$, Talagrand's inequality (Ledoux, 2005, Corr. 4.10) then shows that

$$\mathbf{Prob}\left[\left|\mathbf{E}[\|Fy\|_{k,1}] - \|Fy\|_{k,1}\right| \ge t\right] \le c\, e^{-\frac{t^2}{cA^2\sigma_k^2(F)}}$$

and

$$\mathbf{Prob}\left[\left|\mathbf{E}[\|Fy\|_1] - \|Fy\|_1\right| \ge t\right] \le c\, e^{-\frac{t^2}{cA^2L^2(F)}}$$

where $c$ is an absolute constant, hence the desired result. ■

The parallel with the Gaussian case can be made even more explicit using the following simple majorization result.

Lemma 2.7. Let $V \subset \mathbb{R}^n$ be a finite set. Suppose the variables $\{y_i\}_{i=1,\ldots,n}$ are independent with support in $[-1, 1]$, then

$$\mathbf{E}\left[\sup_{v \in V} v^T y\right] \le \sigma \sqrt{\pi \log |V|}$$

where $\sigma = \max_{v \in V} \|v\|_2$.

Proof. If the variables in $y$ are independent, supported in $[-1, 1]$, then $y \preceq_c g$ where $g \sim \mathcal{N}(0, \frac{\pi}{2} I_n)$ is a Gaussian vector (Ben-Tal et al., 2009, Prop. 10.3.2). The supremum $\sup_{v \in V} v^T y$ is a pointwise maximum of affine functions of $y$, hence is convex in $y$, so $y \preceq_c g$ implies $\mathbf{E}[\sup_{v \in V} v^T y] \le \mathbf{E}[\sup_{v \in V} v^T g]$. Finally, (Massart, 2007, Th. 3.12) shows that $\mathbf{E}[\sup_{v \in V} v^T g] \le \sigma \sqrt{\pi \log |V|}$. ■

If we take $V$ in Lemma 2.7 to be the set of vectors of size $n$ with exactly $k$ entries equal to $+1$ or $-1$, and $n - k$ zeroes, this result shows that, when the coefficients of $y$ are supported on $[-1, 1]$ and independent, then $\mathbf{E}[\|Fy\|_{k,1}]$ is bounded by $\sqrt{\pi/2}\, \mathbf{E}[\|Fg\|_{k,1}]$ with $g$ standard Gaussian. Alternatively, both expectations in (5) can be evaluated efficiently. In fact Hoeffding's inequality shows that if we need to estimate these quantities with precision $\epsilon$ and confidence $1 - \beta$, we need at least $N$ samples of either $\|Fy\|_{k,1}$ or $\|Fy\|_1$ with

$$N = \frac{D^2 \log(2/\beta)}{2\epsilon^2}$$

where $D = \max_{\|y\|_\infty \le A} \|Fy\|_1$ is an upper bound on both norms whenever $\|y\|_\infty \le A$.
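
The sampling argument can be sketched as follows. The code assumes the standard two-sided Hoeffding bound $2\exp(-2N\epsilon^2/D^2)$ for samples taking values in $[0, D]$, replaces the exact constant $D$ by the cruder but always valid bound $A\sum_{ij}|F_{ij}|$, and draws $y$ uniformly on $[-A, A]^m$. All three choices, like the function names, are illustrative assumptions on our part.

```python
import math
import random


def hoeffding_samples(D, eps, beta):
    """Smallest N such that 2 * exp(-2 N eps^2 / D^2) <= beta."""
    return math.ceil(D ** 2 * math.log(2.0 / beta) / (2.0 * eps ** 2))


def crude_norm_bound(F, A):
    """Valid (possibly loose) D: ||Fy||_1 <= A * sum_ij |F_ij| if ||y||_inf <= A."""
    return A * sum(abs(a) for row in F for a in row)


def estimate_mean_l1(F, A, eps, beta, seed=0):
    """Monte Carlo estimate of E||Fy||_1, within eps with confidence 1 - beta."""
    rng = random.Random(seed)
    m = len(F[0])
    N = hoeffding_samples(crude_norm_bound(F, A), eps, beta)
    total = 0.0
    for _ in range(N):
        y = [rng.uniform(-A, A) for _ in range(m)]  # assumed model for y
        total += sum(abs(sum(a * b for a, b in zip(row, y))) for row in F)
    return total / N


print(hoeffding_samples(1.0, 0.1, 0.05))  # 185
print(estimate_mean_l1([[1.0, 0.0], [0.0, 1.0]], 1.0, 0.05, 0.01))
```

For the identity matrix with $A = 1$, each $|y_i|$ is uniform on $[0, 1]$, so the exact mean is 1 and the estimate should land within $\epsilon$ of that value.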

3. Weak recovery and restricted isometry 

In this section, we study the limits of performance of condition (4). We first show that matrices that 
satisfy the restricted isometry property defined in Candes and Tao (2005) at a near-optimal cardinality k 
also satisfy our weak recovery condition (4) for similar values of k. The key difference of course is that we 
will see that the conditions detailed here can be tested efficiently. 

Following Candes and Tao (2005), we will say that a matrix $A \in \mathbb{R}^{m \times n}$ satisfies the restricted isometry property (RIP) at cardinality $k > 0$ if there is a constant $\delta_k > 0$ such that

$$\|x\|_2^2 (1 - \delta_k) \le \|Ax\|_2^2 \le (1 + \delta_k) \|x\|_2^2$$

for all sparse vectors $x \in \mathbb{R}^n$ such that $\mathbf{Card}(x) \le k$. We now show that the RIP allows us to closely control the values of $\sigma_k(F)$ and $L(F)$, hence prove that $F$ satisfies the weak recovery condition (4).

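Lemma 3.1. Suppose the matrix $F^T \in \mathbb{R}^{m \times n}$ satisfies the restricted isometry property with constant $\delta_k > 0$ at cardinality $k$, then

$$\sigma_k(F) \le \sqrt{k(1 + \delta_k)} \quad \mbox{and} \quad \|F_i\|_2 \ge \sqrt{1 - \delta_1}, \qquad (6)$$

and $(k/n)^2 L^2(F) \le \sigma_k^2(F)$.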
Proof. We get

$$
\begin{array}{rcl}
\sigma_k^2(F) & = & \displaystyle\max_{\{(u_+, u_-) \in \{0,1\}^{2n},\ \mathbf{1}^T u \le k\}} \|(u_+ - u_-)^T F\|_2^2 \\
& = & \displaystyle\max_{\{(u_+, u_-) \in \{0,1\}^{2n},\ \mathbf{1}^T u \le k\}} (u_+ - u_-)^T FF^T (u_+ - u_-) \\
& \le & (1 + \delta_k) \|u_+ - u_-\|_2^2 \\
& \le & (1 + \delta_k) k
\end{array}
$$

because $F^T$ satisfies the RIP and $\mathbf{Card}(u_+ - u_-) \le k$. Plugging Euclidean basis vectors in the RIP also means $(1 - \delta_1) \le \|F_i\|_2^2$ for $i = 1, \ldots, n$. Lemma 2.5 showed that $L(F) = \sigma_n(F)$ and combining this with the lower bound in (Srivastav and Wolf, 1998, Lem. 1) on the performance of the greedy algorithm in §4.2.1 shows that $(k/n)^2 L^2(F) \le \sigma_k^2(F)$. ■



This last result allows us to show that $F$ satisfies the weak recovery condition in (4) at cardinalities near $k$ whenever $F^T$ satisfies the RIP at cardinality $k$. In other words, this result shows that our weak recovery condition is indeed weaker than the restricted isometry property. This is what we do now.

Proposition 3.2. Let $m = \mu n$ and $k = \kappa m \log^{-1}(n/k)$ for some $\mu, \kappa \in (0, 1)$. Suppose $F^T \in \mathbb{R}^{m \times n}$ satisfies the restricted isometry property with constant $\delta_k$ with $0 < \delta_k < c < 1$ at cardinality $k$, where $c$ is an absolute constant, then $F$ satisfies condition (4) for $n$ large enough.



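Proof. When $F^T$ satisfies the RIP, Lemma 3.1 above shows

$$\sigma_k(F) \le \sqrt{k(1 + \delta_k)}$$

and, using $L(F) \le (n/k)\sigma_k(F)$ (see Lemma 3.1), we then get $L(F) \le n k^{-1/2} \sqrt{1 + \delta_k}$. Therefore,

$$\sqrt{\frac{2}{\pi}} \sum_{i=1}^n \|F_i\|_2 - \beta L(F) \ge n \sqrt{\frac{2(1 - \delta_1)}{\pi}} - \beta n \sqrt{(1 + \delta_k)/k}$$

for any $\beta > 0$. We also note that $\delta_1 \le \delta_k$ so that

$$\sqrt{\frac{2}{\pi}} \sum_{i=1}^n \|F_i\|_2 - \beta L(F) \ge n \sqrt{\frac{2(1 - \delta_k)}{\pi}} - \beta n \sqrt{(1 + \delta_k)/k}.$$

When $n$ is large enough, $m = \mu n$ and $k = \kappa m \log^{-1}(n/k)$ mean

$$\left( \sqrt{2k\left(1 + \log\left(\frac{2n}{k}\right)\right)} + \beta \right) \sqrt{k(1 + \delta_k)} \le \left( n \sqrt{\frac{2(1 - \delta_k)}{\pi}} - \beta n \sqrt{(1 + \delta_k)/k} \right) \alpha_k,$$

hence $F$ satisfies condition (4) whenever $\beta = O(1)$, since the two terms above are respectively an upper bound on the left-hand side of condition (4) and a lower bound on the right-hand side of condition (4). ■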

4. Bounds on $L(F)$ and $\sigma_k(F)$ using graph partitioning relaxations

In Section 2.1, we showed that if the matrix $F \in \mathbb{R}^{n \times m}$ satisfied the weak recovery condition (4), which read

$$\left( \sqrt{2k\left(1 + \log\left(\frac{2n}{k}\right)\right)} + \beta \right) \sigma_k(F) \le \left( \sqrt{\frac{2}{\pi}} \sum_{i=1}^n \|F_i\|_2 - \beta L(F) \right) \alpha_k$$

for some $\beta > 0$, then the recovery condition in (proba-NSP) would be satisfied with probability $1 - 2e^{-\beta^2/2}$ when $y$ is Gaussian. Testing this weak recovery condition essentially hinged on bounding the Lipschitz constants $\sigma_k(F)$ and $L(F)$. In Section 2.2 we showed that the same quantities allowed us to check the weak recovery condition in a more general model where $y$ is bounded. As we will see below, efficient approximation results on these graph partitioning problems produce relatively tight bounds on both $\sigma_k(F)$ and $L(F)$. In particular, these bounds are tight enough to allow condition (proba-NSP) to be tested in polynomial time at near-optimal values of the cardinality $k$.

4.1. Bounding $L(F)$: MaxCut. We have observed in Lemma 2.3 that the constant $L(F)$ on the right-hand side of condition (4) is defined as

$$L^2(F) = \max_{v \in \{-1,1\}^n} v^T FF^T v. \qquad (7)$$

This is an instance of a graph partitioning problem similar to MaxCut. Goemans and Williamson (1995) and Nesterov (1998a) show that the following relaxation

$$
L^2(F) \le L_{mxct}^2(F) =
\begin{array}[t]{ll}
\mbox{max.} & \mathbf{Tr}(XFF^T) \\
\mbox{s.t.} & \mathbf{diag}(X) = \mathbf{1},\ X \succeq 0,
\end{array}
\qquad (8)
$$

which is a (convex) semidefinite program in the variable $X \in \mathbf{S}_n$, is tight up to a factor $\pi/2$. This means that $(2/\pi) L_{mxct}^2(F) \le L^2(F) \le L_{mxct}^2(F)$. The dual of this last program is written

$$
\begin{array}{ll}
\mbox{minimize} & \mathbf{1}^T w \\
\mbox{subject to} & FF^T \preceq \mathbf{diag}(w),
\end{array}
$$

which is another semidefinite program in the variable $w \in \mathbb{R}^n$. By weak duality, any feasible point of this last problem gives an upper bound on $L^2(F)$.
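
The weak duality argument can be illustrated without a semidefinite solver. In the sketch below we pick the dual point $w_i = \sum_j |(FF^T)_{ij}|$, which is our own choice and not one taken from the paper: with this $w$ the matrix $\mathbf{diag}(w) - FF^T$ is symmetric and diagonally dominant with nonnegative diagonal, hence positive semidefinite, so $w$ is dual feasible and $\mathbf{1}^T w$ bounds $L^2(F)$ from above. The brute-force maximum is exponential in $n$ and is only there to check the inequality on a toy matrix.

```python
import itertools


def gram(F):
    """Return G = F F^T for F given as a list of n rows of length m."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in F] for ri in F]


def maxcut_value(G):
    """Brute-force L^2(F) = max over v in {-1, 1}^n of v^T G v."""
    n = len(G)
    return max(
        sum(v[i] * v[j] * G[i][j] for i in range(n) for j in range(n))
        for v in itertools.product((-1.0, 1.0), repeat=n)
    )


def dual_bound(G):
    """Objective 1^T w at the dual-feasible point w_i = sum_j |G_ij|."""
    return sum(abs(g) for row in G for g in row)


F = [[1.0, 0.5], [-1.0, 1.0], [0.0, -2.0]]  # toy basis with n = 3, m = 2
G = gram(F)
print(maxcut_value(G), dual_bound(G))  # the first value never exceeds the second
```

This particular dual point is crude; solving the dual semidefinite program, or using $w = \lambda_{\max}(FF^T)\mathbf{1}$ as in Lemma 2.3, typically gives a different and often tighter certificate.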
4.2. Bounding $\sigma_k(F)$: $k$-Dense-Subgraph. On the left-hand side of (4), the constant $\sigma_k^2(F)$ is computed as

$$
\sigma_k^2(F) =
\begin{array}[t]{ll}
\mbox{max.} & u^T M u \\
\mbox{s.t.} & \mathbf{1}^T u \le k
\end{array}
\qquad (9)
$$

in the binary variable $u \in \{0,1\}^{2n}$, where $M \in \mathbf{S}_{2n}$ is positive semidefinite, with

$$M = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \otimes FF^T, \qquad (10)$$

here. This is a graph partitioning problem known as $k$-Dense-Subgraph, which seeks to find a subgraph $S$ of the graph of $M$, with at most $k$ nodes and maximum edge weight $\sum_{i,j \in S} M_{ij}$; see Kortsarz and Peleg (1993); Arora et al. (1995); Feige et al. (2001); Feige and Langberg (2001); Han et al. (2002a); Billionnet and Roupin (2006) among others for details. Note that in our application here, $M$ is typically dense and its coefficients can take negative values while most of the references cited above consider graphs with nonnegative (often sparse) weight matrices. The $k$-Dense-Subgraph problem can also be seen as an instance of the Quadratic Knapsack problem (see Lin (1998); Pisinger (2007) for a general overview). We will see that elementary greedy or random sampling algorithms already produce satisfactory approximations. However, their crudeness means that they are outperformed in practice by linear programming or semidefinite relaxation bounds, and we begin by outlining a few of these relaxations below.

4.2.1. A Greedy Algorithm. We now recall the greedy elimination procedure described by e.g. Srivastav and Wolf (1998), which extracts a $k$-subgraph out of a larger graph containing the optimal solution. Suppose we are given a weight matrix $M \in \mathbf{S}_n$, and assume we know an index set $I \subset [1, n]$ such that the weight $w(I) = \sum_{i,j \in I} M_{ij}$ of the subgraph with vertices in $I$ is an upper bound on the optimal value $\sigma_k^2(F)$ of the $k$-Dense-Subgraph problem in (9). If $|I| \le k$, then $I$ is optimal, otherwise we can greedily prune $|I| - k$ vertices from the graph, and Srivastav and Wolf (1998, Lem. 1) give a lower bound on the weight of the pruned subgraph as a fraction of $w(I)$.

When the weight matrix $M$ is nonnegative, the full graph weight $w([1, n])$ produces an obvious upper bound on the optimal weight $w(I^*)$. The situation is slightly more complex when $M$ has negative coefficients, as in the particular instance considered here in (2). In Proposition 6.1, we show how to produce an upper bound $w(I)$ by solving the MaxCut relaxation (8).
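
A greedy elimination step of this kind can be sketched in a few lines. The removal rule used below (drop the vertex whose removal loses the least weight) is our assumption about a reasonable pruning order; the exact procedure and guarantee in Srivastav and Wolf (1998) may differ, and the function names are hypothetical.

```python
def subgraph_weight(M, idx):
    """Weight w(I) = sum over i, j in I of M_ij (diagonal included)."""
    return sum(M[i][j] for i in idx for j in idx)


def greedy_prune(M, I, k):
    """Remove vertices one at a time until k remain, each time dropping
    the vertex whose removal leaves the heaviest remaining subgraph."""
    I = list(I)
    while len(I) > k:
        # For symmetric M, removing i loses M_ii + 2 * sum_{j != i} M_ij.
        loss = {i: M[i][i] + 2.0 * sum(M[i][j] for j in I if j != i) for i in I}
        I.remove(min(I, key=loss.get))
    return I


M = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0],
     [1.0, 1.0, 2.0]]  # toy symmetric weight matrix
kept = greedy_prune(M, [0, 1, 2], 2)
print(kept, subgraph_weight(M, kept))  # a 2-vertex subgraph of weight 5.0
```

On this example the full graph has weight 8 and the pruned 2-vertex subgraph keeps weight 5, which here happens to be the best 2-vertex subgraph.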

4.2.2. Semidefinite Relaxation. Many different relaxations have been developed for the $k$-Dense-Subgraph and Quadratic Knapsack problem and we highlight some of them in what follows. Semidefinite relaxations were derived in Helmberg et al. (2000) to bound $\sigma_k(F)$. In particular, the SQK2 relaxation in Helmberg et al. (2000) yields

$$
\sigma_k^2(F) \le
\begin{array}[t]{ll}
\mbox{max.} & \mathbf{Tr}(MX) \\
\mbox{s.t.} & \mathbf{1}^T X \mathbf{1} \le k^2 \\
& X - \mathbf{diag}(X)\mathbf{diag}(X)^T \succeq 0,
\end{array}
\qquad (11)
$$

which is a semidefinite program in the variable $X \in \mathbf{S}_n$. Note that the constraint $X - \mathbf{diag}(X)\mathbf{diag}(X)^T \succeq 0$ is a Schur complement, hence is convex in $X$. Adaptively adding further constraints as in Helmberg et al. (2000) can further tighten this relaxation. In particular, adding constraints of the type

$$\sum_{j=1}^n X_{ij} \le k X_{ii} \quad \mbox{or} \quad \sum_{j=1}^n (X_{jj} - X_{ij}) \le k(1 - X_{ii}) \qquad (12)$$

for some $i = 1, \ldots, n$, sometimes significantly improves tightness. Another simple relaxation formulated in Helmberg et al. (2000) bounds (9) when $k \ge 2$ by solving

$$
\sigma_k^2(F) \le
\begin{array}[t]{ll}
\mbox{max.} & \mathbf{Tr}(MX) \\
\mbox{s.t.} & \mathbf{Tr}((\mathbf{1}\mathbf{1}^T - I)X) \le k(k - 1) \\
& X - \mathbf{diag}(X)\mathbf{diag}(X)^T \succeq 0,
\end{array}
\qquad (13)
$$

in the variable $X \in \mathbf{S}_n$. This last relaxation is tighter than (11) but not as tight as its refinements using the additional constraints in (12). Another relaxation detailed in Feige and Langberg (2001) first writes (9) as a binary optimization problem over $\{-1, 1\}^n$, then bounds it by solving

$$
\begin{array}{ll}
\mbox{maximize} & \mathbf{Tr}(M(\mathbf{1}\mathbf{1}^T + y\mathbf{1}^T + \mathbf{1}y^T + Y)) \\
\mbox{subject to} & Y\mathbf{1} = y(2k - n) \\
& \mathbf{diag}(Y) = \mathbf{1},\ Y \succeq 0,
\end{array}
\qquad (14)
$$

which is a semidefinite program in the variable $Y \in \mathbf{S}_n$. We refer the reader to Helmberg et al. (2000) for details on the tightness and complexity of these various semidefinite relaxations.

Fortunately, even though the k-Dense-Subgraph problem is NP-Hard, simple randomized or greedy algorithms
reach good approximation ratios (Arora et al. (1995) even produced a PTAS in the dense nonnegative
case). While many tightness results have been derived on the semidefinite relaxations detailed above (see
e.g. Han et al. (2002b)), most of them producing approximation ratios of $k/n$ or better, existing results do
not apply when the coefficients of $M$ have arbitrary signs. Here, we show a similar approximation ratio
when the graph weight matrix $M$ is allowed to have some negative coefficients but is positive semidefinite.
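As a rough illustration of the kind of greedy procedure mentioned above, the sketch below removes one vertex at a time, always dropping the vertex whose removal leaves the largest value of $u^T M u$. This is a generic backward greedy heuristic written for illustration; it is not claimed to be the exact algorithm of any of the cited references, and the function names and random test matrix are our own assumptions.

```python
import itertools
import numpy as np

def backward_greedy(M, k):
    """Start from all vertices and drop one vertex at a time until k remain."""
    keep = list(range(M.shape[0]))
    while len(keep) > k:
        sub = M[np.ix_(keep, keep)]
        total = sub.sum()
        # Value of 1^T sub 1 after deleting vertex j (M is symmetric)
        after = total - 2.0 * sub.sum(axis=1) + np.diag(sub)
        keep.pop(int(np.argmax(after)))
    u = np.zeros(M.shape[0])
    u[keep] = 1.0
    return u

def brute_force(M, k):
    """Exact maximum of u^T M u over binary u with at most k ones."""
    best = 0.0
    for u in itertools.product([0, 1], repeat=M.shape[0]):
        if sum(u) <= k:
            u = np.array(u, dtype=float)
            best = max(best, float(u @ M @ u))
    return best

rng = np.random.default_rng(0)
n, k = 8, 3
B = rng.standard_normal((n, n))
M = B @ B.T  # positive semidefinite, entries of arbitrary sign
u_greedy = backward_greedy(M, k)
greedy_value = float(u_greedy @ M @ u_greedy)
exact_value = brute_force(M, k)
```

On small instances the ratio `greedy_value / exact_value` gives a quick empirical sense of how far such a heuristic is from the optimum.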

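Proposition 4.1. Suppose $M \in \mathbf{S}_n$ is positive semidefinite. Define

\[
V_k(M) = \max_{u \in \{0,1\}^n,\ \mathbf{1}^T u \leq k} u^T M u,
\]

the relaxation

\[
\begin{array}{rl}
SDP_k(M) = \max. & \mathrm{Tr}\, MX \\
\mbox{s.t.} & 0 \leq X_{ij} \leq 1 \\
& \mathrm{Tr}\, X = k,\ X \succeq 0,
\end{array}
\qquad (15)
\]

satisfies, for $n$ large enough and $k \geq n^{1/3}$,

\[
\mu(n,k) \left( \frac{1}{4} \mathrm{Tr}\, MG + \frac{1}{2\pi} SDP_k(M) \right) \leq V_k(M) \leq SDP_k(M),
\]

where $\mu(n,k)$ [...] and $G_{ij} = \sqrt{X_{ii}X_{jj}}$, $i,j = 1, \ldots, n$, so in particular $\mathrm{Tr}\, MG \geq 0$.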

Proof. We use a hybrid randomization procedure, mixing the sparse sampling strategy in Feige and Seltser
(1997) with the correlation argument in Goemans and Williamson (1995) and Nesterov (1998a). Let $X$ be
an optimal solution to problem (15); w.l.o.g. we can assume $X_{ii} > 0$, and we define the corresponding
(positive semidefinite) correlation matrix $C_{ij} = X_{ij}/\sqrt{X_{ii}X_{jj}}$, $i,j = 1, \ldots, n$ and sample vectors
$z \sim \mathcal{N}(0, C)$. For each sample $z$, we define

\[
y_i = \left\{ \begin{array}{ll} 1 & \mbox{if } z_i \geq 0, \\ 0 & \mbox{otherwise.} \end{array} \right.
\]

As in Feige and Seltser (1997), we also sample independent variables $u \in \mathbf{R}^n$ such that

\[
u_i = \left\{ \begin{array}{ll} 1 & \mbox{with probability } q_i = k\sqrt{X_{ii}}/S, \\ 0 & \mbox{otherwise,} \end{array} \right.
\]

where $S = \sum_{i=1}^n \sqrt{X_{ii}}$. Note that $0 \leq q_i \leq 1$ because $0 \leq X_{ii} \leq 1$ and $\sum_i X_{ii} = k$. For each sample, we
then define $w \in \{0,1\}^n$, with $w_i = u_i y_i$, $i = 1, \ldots, n$, so when $i \neq j$

\[
\begin{array}{rcl}
E[w_i w_j] & = & \mathrm{Prob}[z_i \geq 0,\ z_j \geq 0,\ u_i = u_j = 1] \\
& = & \mathrm{Prob}[z_i \geq 0,\ z_j \geq 0]\, \mathrm{Prob}[u_i = 1]\, \mathrm{Prob}[u_j = 1]
\end{array}
\]
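and $E[w_i^2] \geq \mathrm{Prob}[z_i \geq 0]\, \mathrm{Prob}[u_i = 1]^2$. If we define $G \in \mathbf{S}_n$ with $G_{ij} = \sqrt{X_{ii}X_{jj}}$, we conclude that

\[
E[ww^T] \succeq \frac{k^2}{S^2} \left[ \frac{1}{4} G + \frac{1}{2\pi} \arcsin(C) \circ G \right].
\]

Because $X, M \succeq 0$ with $\mathrm{Tr}\, X = k$, we have $S \leq \sqrt{kn}$, and we thus obtain

\[
\begin{array}{rcl}
E[w^T M w] & \geq & \displaystyle \frac{k^2}{S^2} \left( \frac{1}{4} \mathrm{Tr}\, MG + \frac{1}{2\pi} \mathrm{Tr}(M(\arcsin(C) \circ G)) \right) \\
& \geq & \displaystyle \frac{k}{n} \left( \frac{1}{4} \mathrm{Tr}\, MG + \frac{1}{2\pi} SDP_k(M) \right)
\end{array}
\]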

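because $\arcsin(C) \succeq C$ (Nesterov, 1998b, Corr. 3.2), $\mathrm{Tr}(M(\arcsin(C) \circ G)) = \mathrm{Tr}(\arcsin(C)(M \circ G))$,
$C \circ G = X$ and $M, C, G \succeq 0$ so $M \circ G \succeq 0$. Now, let us call $b = \mathrm{Prob}[w^T M w \leq E[w^T M w]/\beta]$ for
some $\beta > 1$. By construction, because $w^T M w \leq SDP_n(M)$ whenever $w \in \{0,1\}^n$ and

\[
w^T M w \leq \frac{E[w^T M w]}{\beta}\, 1_{\{w^T M w \leq E[w^T M w]/\beta\}} + SDP_n(M)\, 1_{\{w^T M w > E[w^T M w]/\beta\}}
\]

we have

\[
E[w^T M w] \leq b\, E[w^T M w]/\beta + (1 - b)\, SDP_n(M)
\]

so

\[
1 - b \geq \frac{\beta - 1}{\beta\, SDP_n(M)/E[w^T M w] - 1}.
\]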
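Now, let us call $Y \in \mathbf{S}_n$ a solution to $SDP_n(M)$; then $kY/n$ is a feasible point of (15), so $SDP_n(M) = \mathrm{Tr}\, MY \leq (n/k)\, \mathrm{Tr}\, MX$
and the previous paragraph shows

\[
\frac{SDP_n(M)}{E[w^T M w]} = \frac{\mathrm{Tr}\, MY}{E[w^T M w]} \leq \frac{2\pi n\, \mathrm{Tr}\, MY}{k\, \mathrm{Tr}\, MX} \leq \frac{2\pi n^2}{k^2}.
\]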
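Therefore, for $n$ large enough, setting

\[
\beta = \frac{1}{1 - 2\pi (n^2/k^2)\, e^{-k^{1/3}/3}},
\]

ensures

\[
\beta \geq \frac{1 - e^{-k^{1/3}/3}}{1 - \frac{SDP_n(M)}{E[w^T M w]}\, e^{-k^{1/3}/3}}.
\]

When the denominator is positive, the previous inequality implies that

\[
\frac{\beta - 1}{\beta\, SDP_n(M)/E[w^T M w] - 1} \geq e^{-k^{1/3}/3}.
\]

Hence, choosing again $n$ large enough to make the denominator positive, we finally have

\[
1 - b \geq \frac{\beta - 1}{\beta\, SDP_n(M)/E[w^T M w] - 1} \geq e^{-k^{1/3}/3}.
\]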

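Now, using Chernoff's inequality as in (Feige and Seltser, 1997, Lem. 4.1) produces

\[
\mathrm{Prob}\left[ \mathrm{Card}(u) - \mathbf{1}^T q \geq t\, \mathbf{1}^T q \right] \leq e^{-t^2 \mathbf{1}^T q / 3},
\]

where $q_i = \mathrm{Prob}[u_i = 1]$. We note that here $\mathbf{1}^T q = k$ and as in (Feige and Seltser, 1997, Th. 4.1), when
$k \geq n^{1/3}$,

\[
\mathrm{Prob}\left[ \mathrm{Card}(u) \geq k\left(1 + k^{-1/3}\right) \right] \leq e^{-k^{1/3}/3}.
\]

This last result, together with the bound on $b$ derived above, shows that

\[
\mathrm{Prob}\left[ w^T M w > E[w^T M w]/\beta \right] = 1 - b \geq e^{-k^{1/3}/3} \geq \mathrm{Prob}\left[ \mathrm{Card}(w) \geq k\left(1 + k^{-1/3}\right) \right].
\]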
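Therefore, by sampling enough points $w$, we can generate a vector $w_0 \in \{0,1\}^n$ such that

\[
w_0^T M w_0 \geq \frac{k}{\beta n} \left( \frac{1}{4} \mathrm{Tr}\, MG + \frac{1}{2\pi} SDP_k(M) \right) \quad \mbox{and} \quad \mathrm{Card}(w_0) \leq k\left(1 + k^{-1/3}\right).
\]

If we remove no more than $k^{2/3}$ variables from $w_0$ using the backward greedy algorithm described in
Srivastav and Wolf (1998, Lem. 1) we lose at most a factor

\[
\frac{k(k-1)}{(k + k^{2/3})(k + k^{2/3} - 1)} = 1 - \frac{2}{k^{1/3}} + o\left( \frac{1}{k^{1/3}} \right)
\]

and, from $w_0$, we obtain a point $w_k$ such that

\[
w_k^T M w_k \geq \frac{k}{\beta n} \left( 1 - \frac{2}{k^{1/3}} + o\left( \frac{1}{k^{1/3}} \right) \right) \left( \frac{1}{4} \mathrm{Tr}\, MG + \frac{1}{2\pi} SDP_k(M) \right) \quad \mbox{and} \quad \mathrm{Card}(w_k) \leq k,
\]

when $n$ is large enough, which yields the desired result. ■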
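The sampling step used in this proof can be sketched numerically as follows. The code draws $z \sim \mathcal{N}(0, C)$, thresholds it, multiplies by independent Bernoulli variables with probabilities $q_i = k\sqrt{X_{ii}}/S$, and compares the empirical off-diagonal second moments of $w$ with the closed form $(k^2/S^2)(1/4 + \arcsin(C_{ij})/(2\pi))G_{ij}$. The matrix `X` below is a hand-picked feasible point of (15), not an optimal solution, and all names are our own assumptions.

```python
import numpy as np

def sample_rounding(X, k, num_samples, rng):
    """Draw samples of w = u * y as in the randomization procedure above."""
    d = np.sqrt(np.diag(X))
    C = X / np.outer(d, d)        # correlation matrix C_ij = X_ij / sqrt(X_ii X_jj)
    np.fill_diagonal(C, 1.0)      # remove rounding noise on the diagonal
    q = k * d / d.sum()           # q_i = k sqrt(X_ii) / S
    n = X.shape[0]
    z = rng.multivariate_normal(np.zeros(n), C, size=num_samples)
    y = (z >= 0).astype(float)
    u = (rng.random((num_samples, n)) < q).astype(float)
    return u * y

rng = np.random.default_rng(0)
n, k = 20, 5
# Assumed feasible point of (15): entries in [0, 1], trace k, PSD
X = (k / n) * (0.5 * np.eye(n) + 0.5 * np.ones((n, n)))
W = sample_rounding(X, k, 20000, rng)

# Closed-form prediction for E[w_i w_j], i != j
d = np.sqrt(np.diag(X))
G = np.outer(d, d)
C = np.clip(X / G, -1.0, 1.0)
S = d.sum()
predicted = (k ** 2 / S ** 2) * (0.25 + np.arcsin(C) / (2 * np.pi)) * G

empirical = (W.T @ W) / W.shape[0]
off_diag = ~np.eye(n, dtype=bool)
max_gap = np.abs(empirical - predicted)[off_diag].max()
mean_cardinality = W.sum(axis=1).mean()
```

For this choice of `X`, each $q_i$ equals $k/n$ and each $w_i$ is one with probability $k/(2n)$, so the average cardinality of $w$ should be close to $k/2$.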

Note that, in the previous result, the condition $k \geq n^{1/3}$ can be replaced by any constraint of the type
$k \geq n^{\alpha}$ where $0 < \alpha < 1$, with $n^{1/3}$ replaced by $n^{\alpha}$.

4.2.3. Sparse eigenvalues and k-Dense-Subgraph. The algorithms listed above (in 4.2.2) suggest that
approximating the k-Dense-Subgraph problem is significantly easier than testing RIP or the nullspace property.
In fact, there is an interesting parallel between the sparse eigenvalue and k-Dense-Subgraph problems. The
k-Dense-Subgraph problem used in bounding $\sigma_k(F)$ is written

\[
\sigma_k^2(F) = \max_{\mathbf{1}^T u \leq k} u^T M u \quad \mbox{where} \quad M = \left( \begin{array}{rr} 1 & -1 \\ -1 & 1 \end{array} \right) \otimes FF^T
\]

in the variable $u \in \{0,1\}^{2n}$. On the other hand, the problem of computing a sparse maximum eigenvalue to
check the restricted isometry property can be written

\[
\lambda^k_{\max}(FF^T) = \max_{u \in \{0,1\}^n,\ \mathbf{1}^T u \leq k}\ \max_{\|x\| = 1}\ u^T (FF^T \circ xx^T) u
\]

in the variables $x \in \mathbf{R}^n$, $u \in \{0,1\}^n$. We observe that computing sparse eigenvalues (for testing RIP)
means solving a k-Dense-Subgraph problem over the result of an inner eigenvalue problem in $x$, while
bounding $\sigma_k(F)$ only requires solving a k-Dense-Subgraph problem over a fixed matrix $M$, hence is
significantly easier.
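The identity behind the k-Dense-Subgraph formulation, namely that $u^T M u = \|(u_+ - u_-)^T F\|_2^2$ when $u = (u_+, u_-)$ and $M$ is the Kronecker product written above, can be checked by brute force on a tiny instance. The dimensions and the random matrix `F` below are arbitrary choices made for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 4, 3, 2
F = rng.standard_normal((n, m))
# M = [[1, -1], [-1, 1]] (Kronecker product) F F^T, of size 2n x 2n
M = np.kron(np.array([[1.0, -1.0], [-1.0, 1.0]]), F @ F.T)

best_quadratic = 0.0   # max of u^T M u
best_norm = 0.0        # max of ||(u_plus - u_minus)^T F||_2^2
max_mismatch = 0.0     # largest pointwise difference between the two forms
for u in itertools.product([0, 1], repeat=2 * n):
    if sum(u) <= k:
        u = np.array(u, dtype=float)
        u_plus, u_minus = u[:n], u[n:]
        quad = float(u @ M @ u)
        norm_sq = float(np.linalg.norm((u_plus - u_minus) @ F) ** 2)
        max_mismatch = max(max_mismatch, abs(quad - norm_sq))
        best_quadratic = max(best_quadratic, quad)
        best_norm = max(best_norm, norm_sq)
```

The two maxima agree because the quadratic form and the squared norm coincide at every feasible point, not only at the optimum.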



5. Complexity 

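Bounding $L(F)$ and $\sigma_k(F)$ using semidefinite relaxations means solving two maximum eigenvalue
minimization problems. Problem (8), used for bounding $L(F)$, can be rewritten

\[
\min_w\ n \lambda_{\max}(FF^T + \mathrm{diag}(w)) - \mathbf{1}^T w \qquad (16)
\]

while problem (11) bounding $\sigma_k(F)$ can be written

\[
\min.\ (k+1) \lambda_{\max}\Big( \mathbf{F} + wH + zG + \sum_{i=1}^n y_i E_i \Big) - w k(k-1) - z \qquad (17)
\]

where

\[
\mathbf{F} = \left( \begin{array}{cc} FF^T & 0 \\ 0 & 0 \end{array} \right), \quad
H = \left( \begin{array}{cc} \mathbf{1}\mathbf{1}^T - \mathbf{I} & 0 \\ 0 & 0 \end{array} \right), \quad
G = \left( \begin{array}{cc} 0 & 0 \\ 0 & 1 \end{array} \right), \quad
E_i = \left( \begin{array}{cc} e_i e_i^T & -e_i/2 \\ -e_i^T/2 & 0 \end{array} \right),
\]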
where $e_i \in \mathbf{R}^n$ is the $i$-th Euclidean basis vector. Given a priori bounds on the norm of the solutions, Nesterov
(2007) showed that solving problems (17) and (16) up to a target precision $\epsilon$ using first-order methods has
total complexity growing as

\[
O(\ldots) \quad \mbox{and} \quad O(\ldots)
\]

for problems (16) and (17) respectively.
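The eigenvalue form of the first problem can be illustrated on a small instance. The sketch below assumes that the quantity being bounded is $\max_{x \in \{-1,1\}^n} x^T FF^T x$ and uses the sign convention $n\lambda_{\max}(FF^T + \mathrm{diag}(w)) - \mathbf{1}^T w$; both are assumptions on our part, since problem (8) is not restated in this section. Under these assumptions every choice of $w$ gives an upper bound, because $x^T FF^T x = x^T(FF^T + \mathrm{diag}(w))x - \mathbf{1}^T w$ and $\|x\|_2^2 = n$.

```python
import itertools
import numpy as np

def dual_objective(A, w):
    """Evaluate n * lambda_max(A + diag(w)) - 1^T w for a symmetric matrix A."""
    n = A.shape[0]
    return n * np.linalg.eigvalsh(A + np.diag(w)).max() - w.sum()

rng = np.random.default_rng(2)
n, m = 6, 3
F = rng.standard_normal((n, m))
A = F @ F.T

# Exact value of the binary quadratic problem, by enumeration (small n only)
cut_value = max(
    float(np.array(x) @ A @ np.array(x))
    for x in itertools.product([-1.0, 1.0], repeat=n)
)

# Any w yields an upper bound; a first-order method would minimize over w
bounds = [dual_objective(A, rng.standard_normal(n)) for _ in range(100)]
bound_at_zero = dual_objective(A, np.zeros(n))
```

A first-order method in the sense discussed above would iterate on `w` to decrease `dual_objective`; here random values of `w` are enough to see that the bound always holds.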

6. Tightness 

Below, we use the result of Proposition 4.1 to show that if a matrix $F$ satisfies the weak recovery
condition (4) up to cardinality $k^*$, the semidefinite relaxation will allow us to certify that $F$ satisfies (4) at
cardinalities very near $k^*$.

Proposition 6.1. Suppose the matrix $F \in \mathbf{R}^{n \times m}$ satisfies the weak recovery condition (4) up to cardinality

$k^* = \gamma(n) n$ for some $\gamma(n) \in (0,1)$, $\beta > 0$ and $\alpha_{k^*} \in [0,1]$, i.e.



y 2k* log'^ + 13 j ak*{F) < |^yi^||F,||2-/3L(F)j a^., 
and let $SDP_k(\cdot)$ be defined as in (15), we have



1=1 



for n sufficiently large, when k < 7(72) (log n) ^k*, with M defined as in (10). 
Proof. Applying the result of Proposition 4.1 at cardinality k* shows 



Using SDPk{M) < SDPk*{M), with k < -f{n){logn)~^k* showing 



1/2 



2k* log I? + /3 



k* V k*^/^ 



l + TTTTk =0(1) 



when $n \to \infty$, yields the desired result. ■



7. Numerical Results 

We start by studying the distribution of the residual error $x^{lp} - e$ when $e$ is a random sparse signal.
We sample a thousand vectors $e \in \mathbf{R}^{100}$ with 15 nonzero i.i.d. uniform coefficients. Our (fixed) design
matrix $A \in \mathbf{R}^{m \times n}$ is Gaussian with $m = 30$. We produce a vector of observations $Ae$ and solve the $\ell_1$
reconstruction problem in ($\ell_1$-recov.) and record the value of $x^{lp} - e$ projected along a fixed (randomly
chosen) direction $v$. The histogram of these values is plotted in Figure 1.
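A reduced version of this experiment can be sketched as below. The $\ell_1$ problem is written as a linear program by splitting $x = p - q$ with $p, q \geq 0$ and solved with SciPy's `linprog`. This is our own minimal setup, not the authors' code: the uniform coefficients are assumed to lie in $[0,1]$, only 20 signals are drawn to keep the run short, and the helper name `l1_recover` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(A, b):
    """Solve min ||x||_1 subject to A x = b as an LP in (p, q), x = p - q."""
    n = A.shape[1]
    res = linprog(
        c=np.ones(2 * n),
        A_eq=np.hstack([A, -A]),
        b_eq=b,
        bounds=(0, None),
        method="highs",
    )
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(3)
m, n, card, trials = 30, 100, 15, 20
A = rng.standard_normal((m, n))      # fixed Gaussian design matrix
v = rng.standard_normal(n)
v /= np.linalg.norm(v)               # fixed random direction

projected, residuals, l1_gaps = [], [], []
for _ in range(trials):
    e = np.zeros(n)
    support = rng.choice(n, size=card, replace=False)
    e[support] = rng.uniform(size=card)   # assumed range [0, 1]
    x = l1_recover(A, A @ e)
    projected.append(float(v @ (x - e)))  # projected reconstruction error
    residuals.append(float(np.linalg.norm(A @ x - A @ e)))
    l1_gaps.append(float(np.abs(x).sum() - np.abs(e).sum()))
```

A histogram of `projected` over many more trials gives the kind of plot described in the text. Since $e$ itself is feasible, the recovered $x$ never has larger $\ell_1$ norm than $e$, which is a convenient sanity check on the solver output.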

We then sample Gaussian matrices of increasing dimensions $n \times n/2$ and plot the mean values of the
relaxation bounds on $L(F)$ (blue circles), $\sigma_k(F)$ (brown diamonds) together with $\sum_{i=1}^n \|F_i\|_2$ (black squares).
These quantities are plotted in loglog scale in Figure 2 on the left. As expected, the norm grows as $n$ while
both $\sigma_k(F)$ and $L(F)$ grow as $\sqrt{n}$. In Figure 2 on the right we plot the empirical (brown squares) versus
predicted (blue circles) probability of recovering signals $e$, where $F \in \mathbf{R}^{n \times m}$ is Gaussian with $n = 300$
and $m = n/2$, for various values of the relative cardinality $k/m$. The empirical probability was obtained by
solving ($\ell_1$-recov.) over one hundred random sparse signals $e \in \mathbf{R}^{300}$ with 15 i.i.d. uniform coefficients. The
predicted probability is obtained by computing $\beta$ from condition (4) after bounding $L(F)$ and $\sigma_k(F)$ using
the convex relaxations detailed in Section 4.

[Figure 1: histogram; horizontal axis: Projected Error.]

Figure 1. Projected reconstruction error $v^T(x^{lp} - e)$, along a fixed (randomly chosen)
direction $v$, using a single Gaussian design matrix with $p = 100$, $m = 30$ and a thousand
samples of a random sparse signal $e \in \mathbf{R}^{100}$ with 15 i.i.d. uniform coefficients.




[Figure 2: two panels; horizontal axes: Leading dimension $n$ (left), Relative Cardinality $k/m$ (right).]

Figure 2. Left: Loglog plot of mean values of $L(F)$ (blue circles), $\sigma_k(F)$ (brown diamonds)
and $\sum_{i=1}^n \|F_i\|_2$ (black squares) for Gaussian matrices of increasing dimensions $n$,
with $m = n/2$. Right: Empirical (brown squares) versus predicted (blue circles) probability
of recovering the true signal $e$, where $F \in \mathbf{R}^{n \times m}$ is Gaussian with $m = n/2$, for various
values of the relative cardinality $k/m$.
8. Appendix

Gaussian matrices are known to satisfy the recovery condition (det-NSP) with high probability for
near-optimal values of $k$, hence obviously satisfy (proba-NSP). Here we directly verify that these matrices satisfy
condition (4) w.h.p. without using RIP. Concentration inequalities have been used in Baraniuk et al. (2008)
to derive a simple proof that some classes of random matrices satisfy RIP; we use similar techniques on the
weak recovery property (4) here.

We start by bounding the fluctuations of the right hand side of inequality (4) when $F \in \mathbf{R}^{n \times m}$ is a
Gaussian random matrix with $F_{ij} \sim \mathcal{N}(0, 1/m)$.

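Lemma 8.1. Let $F \in \mathbf{R}^{n \times m}$ with i.i.d. $F_{ij} \sim \mathcal{N}(0, 1/m)$,

\[
\sum_{i=1}^n E[\|F_i\|_2] = n\left(1 + O(m^{-1})\right)
\]

as $m \to \infty$.

Proof. In this setting, each $\sqrt{m}\|F_i\|_2$ is $\chi$ distributed with $m$ degrees of freedom, so

\[
E[\|F_i\|_2] = \sqrt{\frac{2}{m}}\, \frac{\Gamma((m+1)/2)}{\Gamma(m/2)}, \quad i = 1, \ldots, n.
\]

Using Stirling's formula (Abramowitz and Stegun, 1970, §6.1.37), we get

\[
\frac{\Gamma((m+1)/2)}{\Gamma(m/2)}
= \frac{\exp\left( \frac{m}{2} \log\left(\frac{m+1}{2}\right) - \frac{m-1}{2} \log\left(\frac{m}{2}\right) \right)}{\sqrt{e}} \left(1 + O(m^{-1})\right)
= \sqrt{\frac{m}{2}} \left(1 + O(m^{-1})\right)
\]

as $m \to \infty$, which is the desired result. ■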
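The asymptotic statement in Lemma 8.1 is easy to check numerically. The sketch below evaluates the mean of a $\chi$ variable with $m$ degrees of freedom, rescaled by $1/\sqrt{m}$, through log-gamma functions, and compares it with the first-order expansion $1 - 1/(4m)$. The expansion is a standard consequence of Stirling's formula and is stated here as our own reference point; the function name is hypothetical.

```python
import math

def normalized_chi_mean(m):
    """E[||F_i||_2] when F_i has m i.i.d. N(0, 1/m) entries."""
    log_ratio = math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
    return math.sqrt(2.0 / m) * math.exp(log_ratio)

# Gap between the exact mean and the expansion 1 - 1/(4m)
gaps = {m: abs(normalized_chi_mean(m) - (1 - 1 / (4 * m))) for m in (10, 100, 1000)}
```

The gaps shrink like $1/m^2$, which is consistent with $E[\|F_i\|_2] = 1 + O(m^{-1})$ for each row and hence with the sum being $n(1 + O(m^{-1}))$.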

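We now use concentration inequalities to bound $\sum_{i=1}^n \|F_i\|_2$ in condition (4) with high probability when
$F_{ij} \sim \mathcal{N}(0, 1/m)$.

Lemma 8.2. Let $F \in \mathbf{R}^{n \times m}$ with i.i.d. $F_{ij} \sim \mathcal{N}(0, 1/m)$,

\[
\mathrm{Prob}\left[ \sum_{i=1}^n \|F_i\|_2 \leq \sum_{i=1}^n E[\|F_i\|_2] - x \right] \leq e^{-\frac{m x^2}{2n}}.
\]

Proof. For any $U, V \in \mathbf{R}^{n \times m}$, we have

\[
\left| \sum_{i=1}^n \|U_i\|_2 - \sum_{i=1}^n \|V_i\|_2 \right| \leq \sum_{i=1}^n \|U_i - V_i\|_2 \leq \sqrt{n}\, \|U - V\|_F
\]

so $\sum_{i=1}^n \|F_i\|_2$ is a $\sqrt{n/m}$-Lipschitz function (w.r.t. the Euclidean norm) of $nm$ i.i.d. Gaussian variables
$\sqrt{m} F_{ij} \sim \mathcal{N}(0, 1)$ and (Massart, 2007, Th. 3.4) yields the desired result. ■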
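The Lipschitz property used in this proof, namely that the sum of row norms changes by at most $\sqrt{n}\,\|U - V\|_F$, can be checked on random pairs of matrices. This is only a numerical sanity check with arbitrary dimensions and perturbation sizes of our choosing.

```python
import numpy as np

def sum_row_norms(U):
    """Sum of the Euclidean norms of the rows of U."""
    return np.linalg.norm(U, axis=1).sum()

rng = np.random.default_rng(4)
n, m = 50, 25
worst_ratio = 0.0
for _ in range(200):
    U = rng.standard_normal((n, m))
    V = U + rng.uniform(0.01, 2.0) * rng.standard_normal((n, m))
    gap = abs(sum_row_norms(U) - sum_row_norms(V))
    # np.linalg.norm of a matrix defaults to the Frobenius norm
    worst_ratio = max(worst_ratio, gap / (np.sqrt(n) * np.linalg.norm(U - V)))
```

The ratio stays below one, as the triangle inequality followed by Cauchy-Schwarz predicts.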

We now turn to the left-hand side of inequality (4) and produce inequalities on $\sigma_k(F)$, using again the
fact that it is a Lipschitz function of $F$.

Lemma 8.3. Let $F \in \mathbf{R}^{n \times m}$ with i.i.d. $F_{ij} \sim \mathcal{N}(0, 1/m)$,

\[
\mathrm{Prob}\left[ \sigma_k(F) \geq E[\sigma_k(F)] + x \right] \leq e^{-\frac{m x^2}{2k}}.
\]

Proof. We first note that the max is 1-Lipschitz with respect to the $\ell_\infty$ norm on $\mathbf{R}^n$. Indeed, if $a, b \in \mathbf{R}^n$,

\[
\left| \max_i a_i - \max_j b_j \right| \leq \max_i |a_i - b_i|,
\]

because

\[
a_i - \max_j b_j \leq a_i - b_i \leq |a_i - b_i| \leq \max_i |a_i - b_i|, \quad i = 1, \ldots, n.
\]

Hence, $\max_i a_i - \max_j b_j \leq \max_i |a_i - b_i|$. The two sequences play symmetric roles so we also have
$\max_j b_j - \max_j a_j \leq \max_k |a_k - b_k|$. Now our aim is to show that $F \mapsto \sigma_k(F)$ is a Lipschitz function of
$F$ with respect to the Euclidean norm. The argument we just gave shows that if $F$ and $G$ are two matrices,

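\[
|\sigma_k(F) - \sigma_k(G)| \leq \max_{(u_+, u_-)} \left| \|(u_+ - u_-)^T F\|_2 - \|(u_+ - u_-)^T G\|_2 \right|,
\]

because $\sigma_k(F)$ and $\sigma_k(G)$ are maxima of finite sequences. We now have

\[
\begin{array}{rcl}
\left| \|(u_+ - u_-)^T F\|_2 - \|(u_+ - u_-)^T G\|_2 \right| & \leq & \|(u_+ - u_-)^T (F - G)\|_2 \\
& \leq & \|F - G\|_2\, \|u_+ - u_-\|_2 \\
& \leq & \|F - G\|_F\, \|u_+ - u_-\|_2
\end{array}
\]

which shows that

\[
\sigma_k(F) = \max_{\{(u_+, u_-) \in \{0,1\}^{2n},\ \mathbf{1}^T u \leq k\}} \|(u_+ - u_-)^T F\|_2
\]

is a Lipschitz function of the entries of $F$ (with respect to the Euclidean norm). Now when the entries of $F$ are
i.i.d. $\mathcal{N}(0, 1/m)$, $\sigma_k(F)$ is a Lipschitz function of standard Gaussian variables with Lipschitz constant

\[
\max_{\{(u_+, u_-) \in \{0,1\}^{2n},\ \mathbf{1}^T u \leq k\}} \frac{\|u_+ - u_-\|_2}{\sqrt{m}} \leq \sqrt{\frac{k}{m}},
\]

and (Massart, 2007, Th. 3.12) yields the desired result. ■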
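The Lipschitz argument of this proof can be illustrated by brute force on a tiny instance. The sketch below enumerates all differences $u_+ - u_-$, which are exactly the vectors in $\{-1, 0, 1\}^n$ with at most $k$ nonzero entries, and checks that $|\sigma_k(F) - \sigma_k(G)| \leq \sqrt{k}\,\|F - G\|_F$ on random pairs. Dimensions are kept very small on purpose, and the function name is our own.

```python
import itertools
import numpy as np

def sigma_k(F, k):
    """Brute-force max of ||s^T F||_2 over s in {-1,0,1}^n with at most k nonzeros."""
    best = 0.0
    for s in itertools.product([-1.0, 0.0, 1.0], repeat=F.shape[0]):
        s = np.array(s)
        if np.count_nonzero(s) <= k:
            best = max(best, float(np.linalg.norm(s @ F)))
    return best

rng = np.random.default_rng(6)
n, m, k = 6, 4, 2
worst_ratio = 0.0
for _ in range(20):
    F = rng.standard_normal((n, m))
    G = F + 0.5 * rng.standard_normal((n, m))
    gap = abs(sigma_k(F, k) - sigma_k(G, k))
    worst_ratio = max(worst_ratio, gap / (np.sqrt(k) * np.linalg.norm(F - G)))
```

After rescaling the entries to standard Gaussian variables, the factor $\sqrt{k}$ becomes the $\sqrt{k/m}$ Lipschitz constant used in the lemma.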

Next, to bound $E[\sigma_k(F)]$, we first show a bound on the supremum of an arbitrary number of $\chi$ distributed
random variables.

Lemma 8.4. Let $\{y_i\}_{i \in T}$ be $\chi$ distributed variables with $m$ degrees of freedom, then

\[
E\left[ \sup_{i \in T} y_i \right] \leq \sqrt{m} + \sqrt{2 \log |T|}.
\]

We note that the proof we present applies not only to $\chi$ distributed random variables but more generally
to Lipschitz functions of i.i.d. normal random variables.

Proof. Since the $y_i$'s have the same mean, we have

\[
\sup_{i \in T} y_i = E[y_1] + \sup_{i \in T} (y_i - E[y_i]).
\]

Here we know that $E[y_i] = \sqrt{2}\, \Gamma((m+1)/2)/\Gamma(m/2)$ and we know using Jensen's inequality that $E[y_i] \leq \sqrt{m}$.



The fact that a standard multivariate normal satisfies a log-Sobolev inequality (with constant 1 in the
setup of Ledoux (2005, Chap. 5)) implies through the Herbst argument that any 1-Lipschitz function $F$
(with respect to the Euclidean norm) of i.i.d. Gaussian random variables satisfies (see Ledoux (2005, Eq. 5.8))

\[
\log \psi(z) \triangleq \log E[\exp\{z(F(X) - E[F(X)])\}] \leq \frac{z^2}{2}.
\]

The previous inequality naturally applies to the $y_i$'s since a $\chi_m$ random variable is just the norm of an
$m$-dimensional vector with i.i.d. $\mathcal{N}(0,1)$ entries (and the norm is 1-Lipschitz by the triangle inequality).

Using a classic approach in probability, namely a "soft-max" inequality, the concavity of the log, the
definition of $\psi(z)$ and the fact that the variables $y_i$ are identically distributed, we now have, if $\tilde{y}_i = y_i - E[y_i]$,

\[
\begin{array}{rcl}
E\left[ \sup_{i \in T} \tilde{y}_i \right] & \leq & \displaystyle \frac{1}{z} E\left[ \log\Big( \sum_{i \in T} e^{z \tilde{y}_i} \Big) \right] \\
& \leq & \displaystyle \frac{1}{z} \log\Big( \sum_{i \in T} E\left[ e^{z \tilde{y}_i} \right] \Big) \\
& = & \displaystyle \frac{\log |T| + \log \psi(z)}{z} \leq \frac{\log |T|}{z} + \frac{z}{2}
\end{array}
\]

for any $z > 0$. Optimizing over $z$, we get that

\[
E\left[ \sup_{i \in T} \tilde{y}_i \right] \leq \sqrt{2 \log |T|}.
\]

This gives the desired result. ■
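The two ingredients of this proof, $E[y_i] \leq \sqrt{m}$ and $E[\sup_{i \in T}(y_i - E[y_i])] \leq \sqrt{2\log|T|}$, can be compared with a Monte Carlo estimate. The sketch below draws independent $\chi$ variables, which is only one special case since the argument does not require independence; the sample sizes and names are arbitrary choices of ours.

```python
import math
import numpy as np

rng = np.random.default_rng(5)
m, T, trials = 30, 50, 2000

# Each y_i is the norm of an m-dimensional standard Gaussian vector
samples = np.linalg.norm(rng.standard_normal((trials, T, m)), axis=2)
mean_sup = float(samples.max(axis=1).mean())

# Exact mean of a chi variable with m degrees of freedom
chi_mean = math.sqrt(2.0) * math.exp(math.lgamma((m + 1) / 2) - math.lgamma(m / 2))
centered_sup = mean_sup - chi_mean

bound_centered = math.sqrt(2.0 * math.log(T))
bound_total = math.sqrt(m) + bound_centered
```

In this setting the estimate `mean_sup` sits well below `bound_total`, which suggests the bound is conservative for moderate values of $|T|$.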






Let us now assume that the basis $F \in \mathbf{R}^{n \times m}$ is a Gaussian random matrix (hence $A$ is implicitly defined
here as a matrix annihilating $F$ on the left) with $F_{ij} \sim \mathcal{N}(0, 1/m)$. As detailed below and throughout this
appendix, standard concentration arguments allow us to directly show that $F$ satisfies condition (4), without
resorting to the restricted isometry property. We assume that $m$ scales proportionally to $n$, with $m = \mu n$ as
$n$ goes to infinity. We also assume that $k$ scales as $\kappa m u_m$ with $u_m \to 0$ when $m$ and $n$ go to infinity.

Proposition 8.5. Suppose $m = \mu n$ and $k = \kappa m u_m$ for some $\mu, \kappa \in (0, 1)$, with $u_m \to 0$ as $m \to \infty$.
Let $F \in \mathbf{R}^{n \times m}$ be a Gaussian random matrix with $F_{ij} \sim \mathcal{N}(0, 1/m)$ and $\beta > 0$, then $F$ satisfies
condition (4) with high probability as $n$ goes to infinity.



Proof. We first study the left hand side of (4), which reads 

+ /3 1 <Jk{F) < 




1 +log 



2n 



k 




i=l 



when n goes to infinity. Because ■s/rnjk 1 1 (u+ 
(«+,it_) G {0, 1}^" with l^M = k and nE^U- = 



E[afc(F)] 



E 
E 



ii_)" is X distributed with m d.f. whenever u 
0, Lemma 8.4 shows that for n large enough 



'^^^{u={u+,u-)€{0,l}'^",ll'u=k,ulu-=0} 



|(n+ - n_)^F 



< W — 



1 2k 



. , 2n 
l+log( — 



Here we have used the fact that the cardinality of the set $T$ over which we are taking a supremum is such
that $\log |T| \leq k\left(1 + \log\left(\frac{2n}{k}\right)\right)$, as shown in the proof of Lemma 2.2. We note that for a constant $c > 0$,
we have $k \log\left(\frac{2n}{k}\right) \leq c\, m u_m \log(1/u_m) \ll m$. Therefore, if $c$ denotes a constant that may change from
display to display (but does not depend on $n$ or $m$), we have



E[afc(F)] < cVk 



and 



'2A;log 



2n \ B[ak{F)] 



k 



m 



< C—^/-log{Um) < C^J-U^^ log(tim) > 



m 



where c > does not depend on n. For some arbitrarily small > 0, setting x = ^2kjm in Lemma 
8.3, yields 



Prob cjfc(F) > E[crfe(F)] + .JWJ^ 
We now focus on the right hand side of (4). Lemma 8.1 shows that 



< e" 



lim 



Er=iE[iiF,ii2] 



m 



n 1 
lim — = — . 

n->-oo rn ji 



because -v/m||^«ll2 is X distributed with m degrees of freedom. Setting 
yields 

l^^/2 



u+l 



Prob 



YJi=i \Wih ^ Ya=i e [ll^db] 



n 



m 



m 



3/2 



n 



< e 



/m in Lemma 8.2 then 



which, together with the inequality on the left hand side derived above, means that for n large enough, the 
matrix F satisfies condition (4) with probability at least 1 — 2e~"' " . Finally, with L{F)'^ < n||FF^||2, the 
fact that ||F||2 is 1-Lipschitz (with respect to Euclidian norm as a function of the (Gaussian) entries of F) 
combined with the bound on E[||F||2] detailed in Davidson and Szarek (2001, Prop. 2.14) shows that 



Prob 



\F\\2 <c+V2- 



n 



< e" 



for some absolute constant $c > 0$. This means that $L(F)/m \to 0$ when $n$ goes to infinity and the second
term of the right-hand side of (4) is then negligible compared to the first. ■

This last result shows that the sufficient condition in (4) is weak enough on Gaussian matrices to hold 
w.h.p. near optimal values of the cardinality where the number of samples m is almost proportional to the 
signal size. 